Estimating the Selectivity of XML Path Expression with Predicates by Histograms
Abstract
Selectivity estimation for path expressions over XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which affects the selectivity of the entire query. In this paper, we propose a novel method based on 2-dimensional value histograms to estimate the selectivity of path expressions with embedded predicates. The value histograms capture the correlation between the structure and the values in the XML data. We define a set of operations on the value histograms as well as on the traditional histograms that capture the positional distribution of nodes. We then construct a cost tree based on these operations. The selectivity of any node (or branch) in a path expression can be estimated by executing the cost tree. Compared with previous methods, which ignore the value distribution, our method offers much better estimation accuracy.
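To make the idea of a 2-dimensional value histogram concrete, the following is a minimal sketch and not the paper's actual data structure, operations, or cost tree. It assumes each node has a numeric position (for example, a document-order number) and a numeric value, and that values are spread uniformly within a bucket. The class name `ValueHistogram2D`, its methods, and the bucket boundaries are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


class ValueHistogram2D:
    """Hypothetical 2-D histogram: node counts per (position bucket, value bucket)."""

    def __init__(self, pos_edges, val_edges):
        # Edges are ascending bucket boundaries; bucket i covers [edges[i], edges[i+1]).
        self.pos_edges = pos_edges
        self.val_edges = val_edges
        self.counts = [[0] * (len(val_edges) - 1) for _ in range(len(pos_edges) - 1)]

    @staticmethod
    def _bucket(edges, x):
        # Index of the bucket containing x, clamped to the valid range.
        i = bisect_right(edges, x) - 1
        return min(max(i, 0), len(edges) - 2)

    def add(self, position, value):
        row = self._bucket(self.pos_edges, position)
        col = self._bucket(self.val_edges, value)
        self.counts[row][col] += 1

    def estimate_greater_than(self, threshold):
        """Estimated number of nodes with value > threshold, per position bucket.

        Assumes values are uniformly distributed inside each value bucket.
        """
        result = []
        for row in self.counts:
            estimate = 0.0
            for j, count in enumerate(row):
                lo, hi = self.val_edges[j], self.val_edges[j + 1]
                if threshold <= lo:
                    estimate += count  # whole bucket qualifies
                elif threshold < hi:
                    estimate += count * (hi - threshold) / (hi - lo)  # partial bucket
            result.append(estimate)
        return result


# Illustrative data only: (position, value) pairs for six hypothetical nodes.
hist = ValueHistogram2D(pos_edges=[0, 50, 100], val_edges=[0, 10, 20, 30])
for position, value in [(5, 3), (10, 12), (20, 25), (60, 15), (70, 18), (90, 28)]:
    hist.add(position, value)

per_bucket = hist.estimate_greater_than(15)
selectivity = sum(per_bucket) / 6  # estimated fraction of nodes satisfying value > 15
```

Because the estimate is kept per position bucket, it could in principle be combined with a positional histogram of another node type, which is the kind of structure-value correlation the abstract refers to. How the paper actually combines them is not specified here.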
Similar resources
Structure and Value Synopses for XML Data Graphs
All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that ...
Cost Estimation Techniques for Database Systems
This dissertation is about developing advanced selectivity and cost estimation techniques for query optimization in database systems. It addresses the following three issues related to current trends in database research: estimating the cost of spatial selections, building histograms without looking at data, and estimating the selectivity of XML path expressions. The first part of this disserta...
Fractional XSketch Synopses for XML Databases
A key step in the optimization of declarative queries over XML data is estimating the selectivity of path expressions, i.e., the number of elements reached by a specific navigation pattern through the XML data graph. Recent studies have introduced XSketch structural graph synopses as an effective, space-efficient tool for the compile-time estimation of complex path-expression selectivities over...
On-Line Selectivity Estimation for XML Path Expressions using Markov Histograms
The extensible mark-up language (XML) is gaining widespread use as a format for data exchange and storage on the World Wide Web. Queries over XML data require accurate selectivity estimation of path expressions in order to optimize query execution plans. Selectivity estimation of XML path expression is usually done based on summary statistics about the structure of the underlying XML repository...
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications
Data on the Internet is increasingly presented in XML format. This enables novel applications that pose queries over “all the XML data on the Internet.” Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. In this paper, we propose two techniques for estimating the se...